Support Vector Machines, Kernel Logistic Regression and Boosting
Authors
Abstract
The support vector machine (SVM) is known for its excellent performance in binary classification, i.e., when the response y ∈ {−1, 1}, but its appropriate extension to the multi-class case is still an ongoing research issue. Another weakness of the SVM is that it only estimates sign[p(x) − 1/2], while the probability p(x) = P(Y = 1 | X = x), the conditional probability of a point being in class 1 given X = x, is often of interest itself. We propose a new approach to classification, called the import vector machine (IVM), which is built on kernel logistic regression (KLR). We show on some examples that the IVM performs as well as the SVM in binary classification. The IVM can naturally be generalized to the multi-class case, and furthermore it provides an estimate of the underlying class probabilities. Similar to the "support points" of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM uses. This can give the IVM a computational advantage over the SVM, especially when the training data set is large. We illustrate these techniques on some examples and make connections with boosting, another popular machine-learning method for classification.
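To make the KLR side of this concrete, below is a minimal sketch of kernel logistic regression fit over a small set of basis points, assuming an RBF kernel. The actual IVM selects its import points greedily; this sketch simply takes a random subset to keep the illustration short, and all names (`rbf_kernel`, `fit_klr`, `predict_proba`) are hypothetical.

```python
# Minimal kernel logistic regression sketch with a small random set of
# "import" points standing in for the IVM's greedy selection. Assumes an
# RBF kernel and labels y in {-1, +1}; hypothetical names throughout.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise squared distances -> Gaussian kernel matrix.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_klr(X, y, n_import=20, lam=1e-2, gamma=1.0, n_iter=200, lr=0.1, seed=0):
    """Fit f(x) = sum_j a_j K(x, s_j) by gradient descent on the
    regularized negative log-likelihood; y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_import, replace=False)  # stand-in for greedy selection
    S = X[idx]
    K = rbf_kernel(X, S, gamma)                  # n x m basis matrix
    a = np.zeros(n_import)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-y * (K @ a)))   # P(correct label | x)
        grad = -K.T @ (y * (1.0 - p)) / len(X) + lam * a
        a -= lr * grad
    return S, a

def predict_proba(X, S, a, gamma=1.0):
    # Unlike the SVM, KLR yields an estimate of P(Y = 1 | X = x).
    return 1.0 / (1.0 + np.exp(-rbf_kernel(X, S, gamma) @ a))
```

The point of the sketch is the last line: because the model is fit by (penalized) likelihood rather than hinge loss, `predict_proba` returns the class probability estimate that the SVM's sign[p(x) − 1/2] output cannot provide.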
Similar resources
A Gradient-based Forward Greedy Algorithm for Sparse Gaussian Process Regression
In this chapter, we present a gradient-based forward greedy method for sparse approximation of the Bayesian Gaussian process regression (GPR) model. Different from previous work, which is mostly based on various basis vector selection strategies, we propose to construct, rather than select, a new basis vector at each iterative step. This idea was motivated by the well-known gradient boosting approach...
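For contrast with the construction idea described above, here is a rough sketch of the simpler forward greedy *selection* baseline that the abstract says the method departs from: at each step, pick the training point whose kernel column best matches the current residual, then refit. The gradient-based construction step itself is not reproduced, and all names are hypothetical.

```python
# Forward greedy basis-vector SELECTION baseline for sparse kernel
# (GP-style) regression; the abstract's method CONSTRUCTS a basis vector
# by gradient steps instead, which is not shown here.
import numpy as np

def greedy_sparse_kernel_regression(X, y, kernel, n_basis=10, lam=1e-3):
    n = len(X)
    K = kernel(X, X)                          # full kernel matrix (fine for small n)
    selected, residual = [], y.copy()
    for _ in range(n_basis):
        # Pick the column most correlated with the current residual.
        scores = np.abs(K.T @ residual)
        scores[selected] = -np.inf            # never re-pick a basis vector
        selected.append(int(np.argmax(scores)))
        Ks = K[:, selected]
        # Refit ridge coefficients on the selected columns.
        alpha = np.linalg.solve(Ks.T @ Ks + lam * np.eye(len(selected)),
                                Ks.T @ y)
        residual = y - Ks @ alpha
    return selected, alpha
```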
Boosting and Bagging of Neural Networks with Applications to Financial Time Series
Boosting and bagging are two techniques for improving the performance of learning algorithms. Both techniques have been successfully used in machine learning to improve the performance of classification algorithms such as decision trees and neural networks. In this paper, we focus on the use of feedforward backpropagation neural networks for time series classification problems. We apply boosting ...
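As a hedged illustration of the setup this abstract describes, the sketch below runs AdaBoost-style rounds over small feedforward networks for binary classification. It trains each round on a weighted resample because scikit-learn's `MLPClassifier` does not accept sample weights; the paper's actual architectures and data are not reproduced, and the helper names are hypothetical.

```python
# AdaBoost over small feedforward networks, binary labels y in {-1, +1}.
# Each round trains on a weighted resample of the data (MLPClassifier has
# no sample_weight support). A sketch, not the paper's exact procedure.
import numpy as np
from sklearn.neural_network import MLPClassifier

def boost_mlps(X, y, n_rounds=10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    w = np.full(n, 1.0 / n)                      # example weights
    models, alphas = [], []
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n, replace=True, p=w)   # weighted resample
        clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500,
                            random_state=seed).fit(X[idx], y[idx])
        pred = clf.predict(X)
        err = w[pred != y].sum()
        if err == 0 or err >= 0.5:               # perfect or no better than chance
            if err == 0:
                models.append(clf); alphas.append(1.0)
            break
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)           # up-weight the mistakes
        w /= w.sum()
        models.append(clf); alphas.append(alpha)
    return models, alphas

def boosted_predict(models, alphas, X):
    # Weighted vote of the base networks.
    votes = sum(a * m.predict(X) for a, m in zip(alphas, models))
    return np.sign(votes)
```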
Do we need hundreds of classifiers to solve real world classification problems?
We evaluate 179 classifiers arising from 17 families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest-neighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regre...
Margin Maximizing Loss Functions
Margin maximizing properties play an important role in the analysis of classification models, such as boosting and support vector machines. Margin maximization is theoretically interesting because it facilitates generalization error analysis, and practically interesting because it presents a clear geometric interpretation of the models being built. We formulate and prove a sufficient condition fo...
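For reference, the standard normalized-margin definition that this line of analysis builds on is shown below; the paper's own sufficient condition is not reproduced from the truncated abstract.

```latex
% Standard definitions only; not the paper's sufficient condition.
% Margin of example (x_i, y_i) under a boosted fit f(x) = \sum_t \alpha_t h_t(x):
\[
  m_i \;=\; \frac{y_i \sum_t \alpha_t h_t(x_i)}{\sum_t |\alpha_t|}.
\]
% Informally, a loss is margin maximizing when, as regularization vanishes,
% the normalized minimizer of \sum_i L\bigl(y_i f(x_i)\bigr) approaches the
% maximum-margin solution; the SVM hinge loss L(z) = (1 - z)_+ is the
% canonical example.
```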